1. INTRODUCTION


Economic inequality exists in every society or region because of gaps in income and wealth
among individuals. To reduce the gap between the income levels of individuals, the
government of each country devises several economic policies. Periodic evaluation of
the effect of these policies in reducing the income gap between rich and poor is important.
There are several inequality indexes in the economic literature. Allison (?) mentioned that, among
those indices, the Gini inequality index is the most widely used measure because it satisfies four
basic desirable criteria, viz. (i) anonymity, (ii) scale independence, (iii) population independence,
and (iv) the Pigou-Dalton transfer principle. The Gini index also has an easy interpretation and a
relation to the Lorenz curve.

The most celebrated Gini index, as given in Xu (?), is

$$G_F(X) = \frac{\Delta}{2\mu}, \quad \text{where } \Delta = E|X_1 - X_2|, \ \mu = E(X), \tag{1.1}$$

and $X_1$ and $X_2$ are two i.i.d. copies of a non-negative random variable $X$. If there are $n$
randomly selected individuals with incomes given by $X_1, X_2, \ldots, X_n$, then an estimator of (1.1)
is given by

$$\hat{G}_n = \frac{\hat{\Delta}_n}{2\bar{X}_n}, \tag{1.2}$$

where $\bar{X}_n$ is the sample mean and $\hat{\Delta}_n$ is Gini's mean difference (GMD), defined as

$$\hat{\Delta}_n = \binom{n}{2}^{-1} \sum_{1 \le i_1 < i_2 \le n} |X_{i_1} - X_{i_2}|. \tag{1.3}$$
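
As a minimal illustration of the estimator in (1.2) and (1.3), the following Python sketch computes Gini's mean difference and the resulting Gini index estimate for a small sample. The function names and the income values are ours and purely illustrative; the sketch uses the direct pairwise definition, which costs on the order of $n^2$ operations.

```python
from itertools import combinations


def gini_mean_difference(x):
    """Gini's mean difference: the average of |x_i - x_j| over all pairs i < j."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    total = sum(abs(a - b) for a, b in combinations(x, 2))
    return total / (n * (n - 1) / 2)


def gini_index(x):
    """Estimator of the Gini index: GMD divided by twice the sample mean."""
    mean = sum(x) / len(x)
    return gini_mean_difference(x) / (2 * mean)


incomes = [10.0, 20.0, 30.0, 40.0]    # hypothetical incomes
print(gini_mean_difference(incomes))  # pairwise differences sum to 100 over 6 pairs
print(gini_index(incomes))            # GMD divided by 2 * 25
```

A sample in which everyone has the same income gives an estimate of zero, as expected for perfect equality.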


For continuous evaluation of the economic policies implemented by the government, periodic
computation of the Gini index for the whole country or a region is very important. One source from
which the Gini index of a region or a country can be calculated is census data, which is typically
collected every 10 years. For estimating the Gini index in intermediate years, data from
annual household surveys conducted by government agencies can be used. For instance, the National
Sample Survey (NSS) in India, the European Statistics on Income and Living Conditions in the
European Union, and other agencies conduct household surveys annually or biennially in their
respective regions or countries. However, many countries, for example Burundi, Chad, and Mozambique
(as per the World Bank website), cannot afford to or do not collect data from households on a
relatively large scale even biennially.

If household survey data are not available, one has to draw a relatively small sample to estimate
the Gini index for that region using an appropriate sampling technique. The sampling technique
should be chosen depending on the size and socio-economic diversity of the country. For a brief
review of several sampling techniques, we refer to Cochran (?). To compute the Gini index
for regions or smaller countries with less social diversity, simple random sampling
can be used to collect income or expenditure data. There is a literature on statistical inference
for inequality indices computed from household income or expenditure data obtained by
simple random sampling from the population of interest (e.g., Gastwirth, ?; Beach and Davidson,
?; Davidson and Duclos, ?; Xu, ?; and Davidson, ?). In this paper, we use simple random
sampling to collect income or expenditure data in order to estimate the Gini index accurately.

It is well known that the error in estimation decreases, or in other words the accuracy increases,
when the sample size increases. This in turn increases the overall cost of sampling. To minimize the
cost of sampling, one has to reduce the sample size, which in turn may lead to a higher estimation
error. Thus, a method of estimation should be developed such that both the cost of sampling and
the error in estimation are kept as low as possible. In other words, a procedure is required which
strikes a trade-off between the estimation error and the sampling cost. To achieve this trade-off,
fixed-sample methodologies cannot be used, i.e., the sample size should not be fixed in advance.
This problem falls in the domain of sequential analysis, where it is known as the minimum risk point
estimation problem. For more details on the literature of sequential analysis, we refer to Ghosh
and Sen (?), Ghosh et al. (?), Mukhopadhyay and de Silva (?), and others.

Unlike fixed-sample procedures, sequential procedures do not require the sample size to be fixed
in advance. Instead, in a sequential procedure, statistical analysis continues as the observations
are collected. Sampling is terminated according to a pre-defined criterion, also known as a stopping
rule. Sequential sampling allows the estimation process to finish early, requiring only a small
sample size. We are certainly not the first to suggest sequential methods in econometrics. In fact,
several articles in economics and econometrics journals have pursued the
idea of using sequential or multi-stage inference procedures. Examples include ?, ?, ?, ?, etc.

Below, we provide a brief literature review of some relevant concepts and also our contribution
to the literature of statistical inference and economics.

1.1. Literature Review and Our Contributions 

The estimator of the Gini index in (1.2) involves the sample mean and Gini's mean difference, which
belong to a class of unbiased estimators known as U-statistics. Below, we briefly discuss the
literature on U-statistics.

1.1.1. Literature on U-statistics 

The theory and practice of U-statistics began with the pioneering papers of Hoeffding (?, ?). In
those papers, Hoeffding derived a general method for obtaining unbiased estimators of a
parameter $\theta$ associated with an unknown distribution function $F$. Suppose that $X_1, \ldots, X_n$
are independent and identically distributed (i.i.d.) random variables from a population with a common
distribution function $F$ with an associated parameter $\theta = \theta(F)$, $\theta \in \Theta \subset \mathbb{R}$.
Then the U-statistic associated with $\theta$ is written as follows:

$$U_n = \binom{n}{m}^{-1} \sum_{(n,m)} g(X_{i_1}, \ldots, X_{i_m}),$$

where $\sum_{(n,m)}$ denotes the summation over all possible combinations of indices $(i_1, \ldots, i_m)$
such that $1 \le i_1 < i_2 < \cdots < i_m \le n$, and $m \le n$. Here, $g(\cdot)$ is a symmetric kernel of
degree $m$ such that $E_F[g(X_1, \ldots, X_m)] = \theta(F)$ for all $F$. Thus both the GMD and the sample
mean are U-statistics, with kernels of degree 2 and 1 respectively. Detailed literature on U-statistics
can be found in standard textbooks such as Hollander and Wolfe (?), Lee (?), and others.
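
To make the definition concrete, the following Python sketch evaluates a U-statistic by averaging a symmetric kernel over all index combinations. The helper name `u_statistic` and the sample values are ours and purely illustrative. The three kernels shown recover the sample mean (degree 1), Gini's mean difference (degree 2), and the usual unbiased sample variance (degree 2).

```python
from itertools import combinations
from math import comb


def u_statistic(x, kernel, m):
    """Average of a symmetric kernel of degree m over all m-subsets of the sample."""
    n = len(x)
    if m > n:
        raise ValueError("kernel degree cannot exceed the sample size")
    return sum(kernel(*subset) for subset in combinations(x, m)) / comb(n, m)


sample = [1.0, 4.0, 6.0]  # hypothetical observations

sample_mean = u_statistic(sample, lambda a: a, 1)
gmd = u_statistic(sample, lambda a, b: abs(a - b), 2)
variance = u_statistic(sample, lambda a, b: (a - b) ** 2 / 2, 2)

print(sample_mean, gmd, variance)
```

The variance kernel $(a - b)^2 / 2$ reproduces the sample variance with divisor $n - 1$, which is one way to see why that estimator is unbiased.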

Apart from being unbiased estimators, U-statistics are reverse martingales with respect to a
non-increasing filtration, as proven in Lee (?, p. 119). We exploit the reverse martingale property of
U-statistics to derive the asymptotic results in Section 3. For more literature on reverse martingales,
we refer to classical textbooks on probability theory and stochastic processes such as Loève (?),
Doob (?), and others.

As discussed before, we estimate the Gini index by a sequential method known as minimum 
risk point estimation (MRPE). This estimation technique is not new in the literature of sequential 
analysis. Below, we briefly discuss the developments on minimum risk point estimation. 

1.1.2. Literature on MRPE 

Minimum risk point estimation was first introduced by Robbins (?). He suggested a purely
sequential procedure for estimating the mean of a normal distribution. Ghosh and Mukhopadhyay (?)
generalized this idea to a distribution-free scenario and developed a purely sequential procedure
for minimum risk point estimation of a population mean. Later, Sen and Ghosh (?) extended the
sequential procedure of Ghosh and Mukhopadhyay (?) to accommodate the minimum risk point
estimation of any estimable parameter using U-statistics. For more details on MRPE, we refer our
readers to Sen (?), Ghosh et al. (?), Mukhopadhyay and de Silva (?), and others.

In minimum risk point estimation problems, a cost function is defined which depends on the sample
size and the error in estimation. In this paper, we use the mean square error (MSE) of the Gini index
estimator as the measure of estimation error. We are interested in estimating the unknown optimal
sample size which minimizes the asymptotic cost function for estimating the Gini index of the
population.

1.1.3. Contributions of this paper 

Several fixed-sample methods have been developed for estimation of the Gini index under the
assumption that the incomes of the sampled individuals are independent and identically distributed
(i.i.d.). Examples of such methods can be found in Gastwirth (?), Beach and Davidson (?), Davidson
and Duclos (?), Xu (?), and Davidson (?). However, these methods cannot be used for minimum risk
point estimation of an inequality index. For a brief overview we refer to ?. In this article, we
propose a sequential procedure that yields an asymptotic minimum risk point estimator of the Gini
index by minimizing the asymptotic risk function, defined as the cost of sampling plus a term for the
loss due to estimation error. Under very mild assumptions, we prove that the final sample size of our
procedure approaches the theoretically optimal sample size that minimizes the cost function.
Moreover, we prove that the expected cost of estimating the Gini index using the final sample size is
asymptotically close to the expected cost of estimating the Gini index with the theoretically optimal
sample size. All theoretical results are validated by an extensive simulation study.

The remainder of this paper is organized as follows. Section 2 develops a purely sequential
procedure which balances the estimation error against the overall sampling cost. Section 3
presents the theoretical properties enjoyed by the proposed sequential procedure. The performance of
our method is assessed via a simulation study in Section 4. Section 5 explores the possibility
of satisfying stronger asymptotic optimality properties. In Section 6, we provide some concluding
remarks. The appendix contains some auxiliary lemmas and detailed proofs of all theoretical
results.


2. SEQUENTIAL METHOD OF ESTIMATION


Suppose incomes from $n$ randomly selected individuals are collected. Let the incomes of the $n$
persons be $X_1, \ldots, X_n$, with a common but unknown distribution function $F$. The estimator
$\hat{G}_n$ is a biased estimator of the population Gini index $G_F$, and $E(\hat{G}_n - G_F)^2$ is the
mean square error (MSE) of $\hat{G}_n$. The asymptotic expression for the MSE of $\hat{G}_n$ is given by
the following lemma.

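Lemma 2.1. $E(G_F - \hat{G}_n)^2 = \xi^2/n + o(n^{-1})$, where

$$\xi^2 = \frac{\Delta^2 \sigma^2}{4\mu^4} - \frac{\Delta}{\mu^3}\,(\tau - \mu\Delta) + \frac{\sigma_1^2}{\mu^2}, \tag{2.1}$$

$\sigma_1^2 = V[E(|X_1 - X_2| \mid X_1)]$, $\tau = E(X_1 |X_1 - X_2|)$, and $\sigma^2 = V(X)$,
provided the required moments of $X_1$ exist.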

The proof of the lemma is given in the appendix. If the sample size is large, we receive more
and more information about $G_F$ and, therefore, expect the squared error loss $(G_F - \hat{G}_n)^2$
due to estimation to be small. However, a larger sample size leads to a higher sampling cost.
Therefore, it is desirable to consider a loss function that takes into account both the loss due to
error in estimation and the sampling cost. Suppose $c$ is the known cost of sampling each observation.
Our goal is to find an estimation procedure which keeps both the MSE and the sampling cost low. We
define a cost function depending on the MSE and the cost of sampling, also known as the risk
function, as

$$R_n(G_F) = A\,E(G_F - \hat{G}_n)^2 + cn. \tag{2.2}$$

Here, $A$ is a known positive constant, expressed in monetary terms, which represents the
weight assigned by the researchers or analysts to the probable cost per unit squared error
loss due to estimation. Thus, the first term $A\,E(G_F - \hat{G}_n)^2$ represents the loss in estimating
$G_F$ by $\hat{G}_n$, and the second term $cn$ represents the cost of sampling $n$ observations. The risk
function thus gives the expected cost of estimating $G_F$ using the estimator $\hat{G}_n$ based on
incomes from $n$ individuals. Using the asymptotic expression for the MSE of $\hat{G}_n$ given in (2.1),
the fixed-sample-size risk defined in (2.2) becomes

$$R_n(G_F) = \frac{A\xi^2}{n} + cn + o(n^{-1}). \tag{2.3}$$

Thus, (2.3) gives the expected cost, or the risk, of estimating the unknown value of the population
Gini index using $\hat{G}_n$ based on $n$ observations. Our goal is to find the sample size for which
the approximate expected cost (ignoring the remainder term) in (2.3), i.e., $h(n) = A\xi^2/n + cn$, is
minimized for all distributions that satisfy the conditions of Lemma 2.1.

Considering $n$ as a non-negative continuous variable, the strictly convex function $h(n)$ is
minimized at $n = n_c = \sqrt{A/c}\,\xi$. Thus $n_c$ is the optimal sample size that should be
collected using simple random sampling from the population in order to minimize the expected
cost of estimating $G_F$. The approximate expected cost of estimating the Gini index using a
sample of size $n_c$, or the asymptotic minimum risk, is

$$R^*_{n_c}(G_F) = \frac{A\xi^2}{n_c} + c\,n_c = 2c\,n_c. \tag{2.4}$$
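
As a small numerical illustration of the optimal sample size and the minimum risk in (2.4), the following Python sketch evaluates $n_c = \sqrt{A/c}\,\xi$ and $h(n) = A\xi^2/n + cn$. The function names are ours; the values of $A$ and $c$ are those used later in the simulation study, and the value of $\xi^2$ is taken from the exponential row of Table 1 purely for illustration.

```python
import math


def optimal_sample_size(A, c, xi_sq):
    """n_c = sqrt(A / c) * xi, obtained by setting h'(n) = -A * xi^2 / n^2 + c to zero."""
    return math.sqrt(A / c) * math.sqrt(xi_sq)


def approx_risk(n, A, c, xi_sq):
    """h(n) = A * xi^2 / n + c * n: estimation loss plus sampling cost, remainder ignored."""
    return A * xi_sq / n + c * n


A, c = 50000.0, 0.1   # weight and per-observation cost used in the simulation study
xi_sq = 0.0833        # illustrative value of xi^2 (exponential row of Table 1)

n_c = optimal_sample_size(A, c, xi_sq)
min_risk = approx_risk(n_c, A, c, xi_sq)

print(round(n_c, 2))          # about 204.08
print(min_risk, 2 * c * n_c)  # the two expressions in (2.4) agree

# h(n) is larger one observation to either side of n_c
print(approx_risk(n_c - 1, A, c, xi_sq) > min_risk)
print(approx_risk(n_c + 1, A, c, xi_sq) > min_risk)
```

At the minimizer, the estimation loss $A\xi^2/n_c$ and the sampling cost $c\,n_c$ are equal, which is why the minimum risk takes the simple form $2c\,n_c$.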


If the parameter $\xi$ were known in advance, one could simply collect a sample of size $n_c$, which is
the minimum sample size to attain the asymptotic minimum risk. Since $\xi$ is not known, we need
to collect samples in at least two stages, where the first stage is used to estimate $\xi$ and $n_c$
based on a pilot sample. In fact, Dantzig (?) proved that fixed-sample procedures cannot minimize the
risk in (2.3), not even asymptotically. Therefore, we propose a purely sequential procedure that
yields the minimum risk at least asymptotically.

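Since $\xi$ is unknown, we first provide an estimator of $\xi^2$ that is strongly consistent. The
estimator of $\xi^2$ is based on U-statistics and can also be found in Xu (?) and Sproule (?).
Proceeding along the lines of Sproule (?), let us define a U-statistic, for each $j = 1, 2, \ldots, n$,

$$\hat{\Delta}_{n-1}^{(j)} = \binom{n-1}{2}^{-1} \sum_{T_j} |X_{i_1} - X_{i_2}|,$$

where $T_j = \{(i_1, i_2) : 1 \le i_1 < i_2 \le n \text{ and } i_1, i_2 \ne j\}$. Also, define
$W_{jn} = n\hat{\Delta}_n - (n - 2)\hat{\Delta}_{n-1}^{(j)}$ for $j = 1, \ldots, n$, and
$\bar{W}_n = n^{-1} \sum_{j=1}^{n} W_{jn}$. According to Sproule (?), a strongly consistent estimator
of $4\sigma_1^2$ is

$$s_{wn}^2 = (n - 1)^{-1} \sum_{j=1}^{n} (W_{jn} - \bar{W}_n)^2.$$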
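
The leave-one-out construction above can be sketched in Python as follows. The function names are ours, the sample values are hypothetical, and the divisor $n - 1$ in the variance of the $W_{jn}$ is an assumption on our part about the normalization. The sketch also checks a convenient identity that follows from the definitions: $W_{jn}$ equals twice the average absolute difference between $X_j$ and the other $n - 1$ observations.

```python
from itertools import combinations


def gmd(x):
    """Gini's mean difference of the list x (needs at least two values)."""
    n = len(x)
    return sum(abs(a - b) for a, b in combinations(x, 2)) / (n * (n - 1) / 2)


def w_values(x):
    """W_jn = n * GMD(all) - (n - 2) * GMD(all but x_j), for each j (needs n >= 3)."""
    n = len(x)
    full = gmd(x)
    return [n * full - (n - 2) * gmd(x[:j] + x[j + 1:]) for j in range(n)]


def s2_w(x):
    """Sample variance of the W_jn; the divisor n - 1 is an assumed normalization."""
    w = w_values(x)
    n = len(w)
    w_bar = sum(w) / n
    return sum((v - w_bar) ** 2 for v in w) / (n - 1)


x = [1.0, 2.0, 4.0, 8.0]  # hypothetical incomes
w = w_values(x)

# Shortcut: W_jn = 2 * (sum_i |x_j - x_i|) / (n - 1)
shortcut = [2 * sum(abs(xj - xi) for xi in x) / (len(x) - 1) for xj in x]
print(w)
print(shortcut)
print(s2_w(x))
```

The shortcut avoids recomputing a full leave-one-out GMD for every $j$, which matters when the estimator is updated after each new observation.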

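Using Xu (2007),

$$\hat{\tau}_n = \frac{1}{n(n - 1)} \sum_{(n,2)} (X_{i_1} + X_{i_2})\,|X_{i_1} - X_{i_2}|$$

is an estimator of $\tau$. Let $S_n^2$ be the sample variance. Thus, the estimator of $\xi^2$ is

$$V_n^2 = \frac{\hat{\Delta}_n^2 S_n^2}{4\bar{X}_n^4} - \frac{\hat{\Delta}_n \hat{\tau}_n}{\bar{X}_n^3} + \frac{\hat{\Delta}_n^2}{\bar{X}_n^2} + \frac{s_{wn}^2}{4\bar{X}_n^2}. \tag{2.5}$$

Using Sproule (1969) and Theorem 3.2.1 of Sen (1981, p. 50), we conclude that $V_n^2$ is a strongly
consistent estimator of $\xi^2$.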
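
The estimator of $\xi^2$ can be assembled from its four ingredients as in the following Python sketch. It assumes the plug-in form $\hat{\Delta}^2 S^2/(4\bar{X}^4) - \hat{\Delta}\hat{\tau}/\bar{X}^3 + \hat{\Delta}^2/\bar{X}^2 + s_w^2/(4\bar{X}^2)$, the divisor $n - 1$ for the variance of the $W_{jn}$, and strictly positive data; the function name `v_squared` is ours. All pairwise quantities are obtained from the row sums $\sum_i |x_j - x_i|$.

```python
import random


def v_squared(x):
    """Plug-in estimate of xi^2 from a sample of positive incomes (O(n^2) sketch)."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)               # sample variance
    row_sums = [sum(abs(xj - xi) for xi in x) for xj in x]       # sum_i |x_j - x_i|
    delta = sum(row_sums) / (n * (n - 1))                        # Gini's mean difference
    tau = sum(xj * rs for xj, rs in zip(x, row_sums)) / (n * (n - 1))
    w = [2 * rs / (n - 1) for rs in row_sums]                    # W_jn values
    w_bar = sum(w) / n
    s2_w = sum((v - w_bar) ** 2 for v in w) / (n - 1)            # estimates 4 * sigma_1^2
    return (delta ** 2 * s2 / (4 * mean ** 4)
            - delta * tau / mean ** 3
            + delta ** 2 / mean ** 2
            + s2_w / (4 * mean ** 2))


rng = random.Random(12345)
sample = [rng.expovariate(5.0) for _ in range(600)]   # exponential incomes, rate 5
print(v_squared(sample))                              # should be near 1/12 for this model
```

Because the Gini index is scale free, rescaling all incomes by a positive constant leaves the value of `v_squared` unchanged, which is a quick sanity check on any implementation.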

We outline the purely sequential estimation procedure for the Gini index of the population as
follows:

Step 1: In the first step, often called the pilot sample step, incomes from a sample of $m$ individuals
are collected. This sample is called the pilot sample. Based on this pilot sample of size $m$, an
estimate of $\xi^2$ is obtained by computing $V_m^2$. Check the condition $m \ge \sqrt{A/c}\,V_m$.
If $m < \sqrt{A/c}\,V_m$, then go to the next step. Otherwise, if $m \ge \sqrt{A/c}\,V_m$, stop
sampling and set the final sample size equal to $m$.

Step 2: Obtain the income of one more randomly selected individual. Update the estimate of $\xi^2$ by
computing $V_{m+1}^2$, and verify the condition based on the $m + 1$ observations. If
$m + 1 \ge \sqrt{A/c}\,V_{m+1}$, stop further sampling and set the final sample size equal to $m + 1$.
If $m + 1 < \sqrt{A/c}\,V_{m+1}$, continue the sampling process by sampling one more individual and
checking the updated condition.

The sampling process is continued until the updated condition is satisfied.

Formally, we define the stopping rule $N$, for every $c > 0$, as

$$N = N(c) \text{ is the smallest integer } n\ (\ge m) \text{ such that } n \ge \sqrt{\frac{A}{c}}\,V_n. \tag{2.6}$$

Here, $m$ is the initial or pilot sample size. In some extreme situations, the estimator $V_n$ may be
very small, which may cause our procedure to stop too early. To avoid this problem, we propose a
slightly modified stopping rule $N_c$ as

$$N_c \text{ is the smallest integer } n\ (\ge m) \text{ such that } n \ge \sqrt{\frac{A}{c}}\,\left(V_n + n^{-\gamma}\right), \tag{2.7}$$

where $\gamma \in (0, 0.5)$ is a suitable constant. The inclusion of the term $n^{-\gamma}$ ensures that
we do not stop too early due to a small value of $V_n$.
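
The purely sequential procedure with the modified stopping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names, the `max_n` safety cap, the guard against a negative variance estimate, and the particular values of $A$, $c$, and $\gamma$ in the usage example are ours, and the estimator `v_squared` assumes the plug-in form of (2.5) with divisor $n - 1$ for the variance of the $W_{jn}$.

```python
import math
import random


def v_squared(x):
    """Plug-in estimate of xi^2 from positive incomes (O(n^2) per call)."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    row_sums = [sum(abs(xj - xi) for xi in x) for xj in x]
    delta = sum(row_sums) / (n * (n - 1))
    tau = sum(xj * rs for xj, rs in zip(x, row_sums)) / (n * (n - 1))
    w = [2 * rs / (n - 1) for rs in row_sums]
    w_bar = sum(w) / n
    s2_w = sum((v - w_bar) ** 2 for v in w) / (n - 1)
    return (delta ** 2 * s2 / (4 * mean ** 4) - delta * tau / mean ** 3
            + delta ** 2 / mean ** 2 + s2_w / (4 * mean ** 2))


def sequential_sample_size(draw, A, c, m=10, gamma=0.25, max_n=10000):
    """Sample one income at a time until n >= sqrt(A/c) * (V_n + n**(-gamma))."""
    x = [draw() for _ in range(m)]                    # pilot sample of size m
    root = math.sqrt(A / c)
    while True:
        n = len(x)
        v_n = math.sqrt(max(v_squared(x), 0.0))       # guard: clip a negative estimate
        if n >= root * (v_n + n ** (-gamma)) or n >= max_n:   # max_n is a safety cap
            return n, x
        x.append(draw())                              # one more observation


rng = random.Random(2024)
n_final, sample = sequential_sample_size(
    lambda: rng.expovariate(5.0), A=2000.0, c=0.1, gamma=0.45)
print(n_final)
```

Recomputing `v_squared` from scratch at every step keeps the sketch short; a production version would update the row sums incrementally as each observation arrives.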

3. THEORETICAL RESULTS 

For a given cost $c$ per observation, the risk, or the expected cost, of estimating the Gini index
$G_F$ using an estimator based on the final sample size $N_c$ is given by

$$R_{N_c}(G_F) = A\,E(G_F - \hat{G}_{N_c})^2 + c\,E(N_c). \tag{3.1}$$

Thus, the estimator $\hat{G}_{N_c}$ is an asymptotically minimum risk point estimator (AMRPE) if the
ratio regret is asymptotically 1, i.e., if

$$\lim_{c \to 0} R_{N_c}(G_F) / R^*_{n_c}(G_F) = 1. \tag{3.2}$$

In other words, the estimator $\hat{G}_{N_c}$ is AMRPE (see Sen, 1981) if the expected cost of estimating
the Gini index $G_F$ using an estimator based on the final sample size $N_c$ is asymptotically close to
the expected cost of estimating $G_F$ using the optimal sample size $n_c$. In the decision-theoretic
framework, the ratio in (3.2) is known as the ratio regret, which is the ratio between the actual
payoff and the minimum payoff due to some optimal strategy (Loomes and Sugden, ?).

Before discussing the asymptotic optimality properties of our method, we prove in the following
lemma that if observations are collected using (2.7), sampling will stop at some finite time with
probability one.

Lemma 3.1. Under the assumption that $\xi < \infty$, for any $c > 0$, the stopping time $N_c$ is finite,
i.e., $P(N_c < \infty) = 1$.

The proof of this lemma is given in the Appendix. This lemma is crucial for any sequential
procedure because it assures that the practitioner will not need to sample indefinitely. Below we
provide the main theorem related to the asymptotic optimality properties of our procedure.

Theorem 3.1. The stopping rule (2.7) yields:

(i) $N_c/n_c \to 1$ almost surely as $c \downarrow 0$.

(ii) $E(N_c/n_c) \to 1$ as $c \downarrow 0$. [Asymptotic First-order Efficiency]

(iii) If $\gamma \in (0, \tfrac{1}{2})$, $R_{N_c}(G_F)/R^*_{n_c}(G_F) \to 1$ as $c \downarrow 0$. [Asymptotic First-order Risk Efficiency]

provided the required positive and negative moments of $X$ exist.

Proof. The proof of this theorem is technical and, therefore, is given in the Appendix. □

Parts (i) and (ii) of this theorem imply that the final sample size of our procedure is
asymptotically the same as the minimum sample size required to minimize the asymptotic risk defined in
(2.3). Part (iii) proves that the risk attained by our procedure is asymptotically the same as the
minimum risk. Therefore, the Gini index estimator $\hat{G}_{N_c}$ is indeed AMRPE. The optimality
properties in parts (ii) and (iii) are well known in the sequential literature as asymptotic first-order
efficiency and asymptotic first-order risk efficiency, respectively (see Mukhopadhyay and de Silva,
2009). Theorem 3.1 also holds for the stopping rule defined in (2.6).

4. PERFORMANCE VIA SIMULATIONS 

In this section, we evaluate the performance of our estimation strategy for moderate sample sizes
(i.e., $c$ is small but not too small) via a simulation study.

To implement the sequential procedure in (2.6), we fix $c = 0.1$, $A = 50000$, and the pilot sample
size $m = 10$. The results in Tables 1 and 2 are based on random samples from three income
distributions: exponential (rate = 5), gamma (shape = 2.649, rate = 0.84), and log-normal (mean
= 2.185, sd = 0.562). The number of replications used in all Monte Carlo simulations is 5000. Table
1 compares the true values of the parameters $4\sigma_1^2$, $\tau$, and $\xi^2$ with their estimated
values based on the final sample size $N$. $s(s_{wN}^2)$, $s(\hat{\tau}_N)$, and $s(V_N^2)$ represent the
standard errors of the estimators $s_{wN}^2$, $\hat{\tau}_N$, and $V_N^2$, respectively.

Table 1. Estimated sample variances and covariances

Distribution   $s_{wN}^2$   $s(s_{wN}^2)$   $4\sigma_1^2$   $\hat{\tau}_N$   $s(\hat{\tau}_N)$   $\tau$    $V_N^2$   $s(V_N^2)$   $\xi^2$
Exponential    0.0521       0.0002          0.0532          0.0596           0.0001              0.06000   0.0843    0.0002       0.0833
Gamma          3.4172       0.0157          3.5036          7.8110           0.0147              7.8205    0.0463    0.0001       0.0468
Lognormal      52.11274     0.1173          52.8108         84.9292          0.0694              85.2236   0.0498    0.00009      0.0526

Table 2. Estimated average final sample size and the ratio regret

Distribution   $\bar{N}$   $s(\bar{N})$   $n_c$    $\bar{N}/n_c$   $\max(N)$   $\bar{r}_N$   $s(\bar{r}_N)$   $\bar{r}_N/R^*_{n_c}$
Exponential    205.4111    0.2378         204.08   1.0065          319         40.9317       0.0474           1.0028
Gamma          152.19      0.1970         152.97   0.9949          239         30.2765       0.0391           0.9904
Lognormal      162.3504    0.1483         163.10   0.9954          228         152.07        0.1958           0.9919

Table 1 shows that the average values of the estimators are close to the true values of the
parameters and, therefore, it indicates that $s_{wN}^2 \to 4\sigma_1^2$, $\hat{\tau}_N \to \tau$, and
$V_N^2 \to \xi^2$ as $c \downarrow 0$.

Table 2 presents the average final sample size $\bar{N}$ (which estimates $E(N)$), the maximum sample
size $\max(N)$ from 5000 replications, and the average risk $\bar{r}_N$ (which estimates $R_N(G_F)$)
obtained from the sample of size $N$. Moreover, $s(\bar{N})$ and $s(\bar{r}_N)$ represent the standard
errors of $\bar{N}$ and $\bar{r}_N$, respectively. Table 2 shows that the average sample size $\bar{N}$
is almost the same as the optimal sample size $n_c$. Therefore, on average, our procedure requires
only the minimum sample size $n_c$. The last column of Table 2 illustrates that, on average, the cost
of estimating the Gini index $G_F$ using an estimator based on the estimated final sample size is
close to the expected cost of estimating $G_F$ using the optimal sample size $n_c$; in other words,
the ratio regret is very close to 1. This implies that the risk incurred by our method is almost the
same as the minimum possible risk defined in (2.4). Thus, we find that the proposed sequential
procedure performs remarkably well for the above-mentioned income distributions.
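
As a consistency check on the true-value columns of Table 1, the exponential case can be worked out in closed form. The short derivation below is ours and is not part of the original simulation study. It assumes the expression $\xi^2 = \Delta^2\sigma^2/(4\mu^4) - \Delta(\tau - \mu\Delta)/\mu^3 + \sigma_1^2/\mu^2$ and the standard facts $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$ for an exponential distribution with rate $\lambda$.

```latex
\begin{align*}
\Delta &= E|X_1 - X_2| = \frac{1}{\lambda}, \\
E\left(|X_1 - X_2| \mid X_1 = x\right) &= x - \frac{1}{\lambda} + \frac{2}{\lambda}\,e^{-\lambda x}, \\
\sigma_1^2 &= V(X_1) + \frac{4}{\lambda^2}\,V\!\left(e^{-\lambda X_1}\right)
             + \frac{4}{\lambda}\,\mathrm{Cov}\!\left(X_1, e^{-\lambda X_1}\right)
            = \frac{1}{\lambda^2} + \frac{1}{3\lambda^2} - \frac{1}{\lambda^2}
            = \frac{1}{3\lambda^2}, \\
\tau &= E(X_1^2) - \frac{1}{\lambda}\,E(X_1) + \frac{2}{\lambda}\,E\!\left(X_1 e^{-\lambda X_1}\right)
      = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} + \frac{1}{2\lambda^2}
      = \frac{3}{2\lambda^2}, \\
\xi^2 &= \frac{\Delta^2\sigma^2}{4\mu^4} - \frac{\Delta}{\mu^3}\,(\tau - \mu\Delta) + \frac{\sigma_1^2}{\mu^2}
       = \frac{1}{4} - \frac{1}{2} + \frac{1}{3} = \frac{1}{12}.
\end{align*}
```

With $\lambda = 5$ these give $4\sigma_1^2 = 4/75 \approx 0.0533$, $\tau = 0.06$, and $\xi^2 \approx 0.0833$, in line with the Exponential row of Table 1 up to rounding in the last reported digit. The value of $\xi^2$ does not depend on $\lambda$, as expected for a scale-free index.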

5. EXTENSIONS AND DISCUSSIONS 

5.1. Exploring Asymptotic Second-Order Efficiency 

In the sequential point estimation literature, a stopping rule $N_c$ is known as asymptotically
second-order efficient (see Ghosh and Mukhopadhyay, ?) if the difference between the expected final
sample size $E(N_c)$ and the theoretically optimal fixed-sample size $n_c$ is asymptotically bounded,
i.e., if $E(N_c) - n_c$ is bounded as $c \downarrow 0$. Clearly, if a sequential method is second-order
efficient, it is first-order efficient as well. However, the converse is not necessarily true. We
explore this second-order efficiency property via Monte Carlo simulations. Under the same scenario as
in Tables 1 and 2, we apply our method and estimate the difference $E(N_c) - n_c$ based on 500
replications. We repeat this process 10 times and present the 10 observed values of $\bar{N} - n_c$,
each estimating $E(N_c) - n_c$. Table 3 shows that the estimated differences $E(N_c) - n_c$ are quite
small for all three distributions. Therefore, the simulation study strongly indicates that the
proposed sequential procedure is asymptotically second-order efficient.

Table 3. Estimated values of $E[N_c] - n_c$

Distribution   $\bar{N} - n_c$ values
Exponential     1.9800   1.1560   0.7660   0.6760   1.5520
                1.5100   2.600    0.7020   1.0380   1.9020
Gamma          -1.178   -0.834   -0.98    -0.438   -0.274
               -0.686   -1.338   -0.282   -1.2     -0.626
Lognormal      -0.7211  -0.6611  -0.9591  -0.1091  -0.8331
               -1.2671  -0.7951  -1.2211   0.6771  -0.2031


5.2. Exploring Asymptotic Second-Order Risk Efficiency 

In the sequential point estimation literature, a stopping rule $N_c$ is known as asymptotically
second-order risk efficient (see Ghosh and Mukhopadhyay, ?) if the difference regret, i.e.,
$R_{N_c}(G_F) - R^*_{n_c}(G_F)$, is asymptotically bounded. This property implies asymptotic first-order
risk efficiency. We explore this second-order risk efficiency property via Monte Carlo simulations.
For each of the three distributions in Table 4, 10 observed values of $\bar{r}_{N_c} - R^*_{n_c}$ are
presented, each estimating $R_{N_c}(G_F) - R^*_{n_c}(G_F)$. Table 4 shows that the estimated
differences are quite small for all three distributions. The Monte Carlo simulations strongly indicate
that the proposed sequential procedure is asymptotically second-order risk efficient.


Table 4. Estimated values of $R_{N_c}(G_F) - R^*_{n_c}(G_F)$

Distribution   $\bar{r}_{N_c} - R^*_{n_c}$ values
Exponential     0.2397  -0.0016   0.1568   0.3799   0.0619
                0.0796  -0.0150   0.1564  -0.0165   0.2302
Gamma          -0.3733  -0.3387  -0.1901  -0.4009  -0.3769
               -0.3048  -0.2218  -0.2768  -0.1933  -0.2547
Lognormal      -0.3080  -0.2878  -0.2796  -0.2480  -0.2494
               -0.3704  -0.3587  -0.1540  -0.2625  -0.1379


6. CONCLUDING REMARKS

The Gini index, or Gini concentration, is a very popular measure of inequality. It is well known
that the error in estimation of the Gini index decreases when the sample size increases. A larger
sample, however, inflates the overall cost of sampling. To compute the Gini index for a region or a
smaller country with less diversity at a specific point in time, we develop a procedure which
determines the final sample size needed to keep both the error of estimation and the cost of sampling
low via simple random sampling.

Without assuming any specific distribution for the data, we showed that the average final sample 
size using our procedure approaches the unknown optimal sample size that minimizes the cost 
function. Moreover, we proved that the expected cost for estimating the Gini index using the 
estimated final sample size is asymptotically close to the expected cost for estimating the Gini 
index using the unknown optimal sample size. Thus, based on the results mentioned above, we 
conclude that the proposed sequential estimation strategy is remarkably efficient in reducing both 
sampling cost and estimation error. 

7. APPENDIX: AUXILIARY RESULTS AND PROOFS

7.1. Proof of Lemma 3.1

Note that $V_n^2$ is a strongly consistent estimator of $\xi^2$. Therefore, for any fixed $c > 0$,

$$P(N_c = \infty) = \lim_{n \to \infty} P(N_c > n) \le \lim_{n \to \infty} P\left(n < \sqrt{A/c}\,\left(V_n + n^{-\gamma}\right)\right) = 0.$$

The last equality is obtained since $V_n \to \xi$ almost surely as $n \to \infty$. This completes the proof.

7.2. Lemmas to Prove the Main Result


This section is dedicated to proving some lemmas that are essential to establish the main Theorem
3.1. First, we introduce some notation. Note from (2.7) that $N_c \ge \sqrt{A/c}\,N_c^{-\gamma}$, i.e.,
$N_c \ge (A/c)^{1/(2(1+\gamma))}$ with probability 1. For fixed $\epsilon, \gamma > 0$, define

$$n_{1c} = \left(\frac{A}{c}\right)^{\frac{1}{2(1+\gamma)}}, \quad n_{2c} = n_c(1 - \epsilon), \quad \text{and} \quad n_{3c} = n_c(1 + \epsilon), \quad \text{where } n_c = \sqrt{\frac{A}{c}}\,\xi. \tag{7.1}$$

Suppose $X_{(n)}$ denotes the $n$-dimensional vector of order statistics from the sample
$X_1, \ldots, X_n$, and $\mathcal{X}_n$ is the $\sigma$-algebra generated by
$(X_{(n)}, X_{n+1}, X_{n+2}, \ldots)$. By Lee (1990), $\{\hat{\tau}_n, \mathcal{X}_n\}$,
$\{\hat{\Delta}_n, \mathcal{X}_n\}$, and their convex functions are all reverse submartingales. Using
reverse submartingale properties of U-statistics, we prove the following maximal inequality for the
sample Gini's mean difference.


Lemma 7.1. If nonnegative i.i.d. random variables Xi,..., X„ are from the distribution F such 
that < 00 for some positive integers r and p, then for any k > 0, 


P I max 

nic<n<n2c 


- A" 

n 


> /c^ < as c 10. 


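Proof. Note that

$$|\hat{\Delta}_n^2 - \Delta^2| = (\hat{\Delta}_n^2 - \Delta^2)\,I(\hat{\Delta}_n > \Delta) + (\Delta^2 - \hat{\Delta}_n^2)\,I(\hat{\Delta}_n \le \Delta) \le (\hat{\Delta}_n^2 - \Delta^2)^+ + 2\Delta\,|\hat{\Delta}_n - \Delta|\,I(\hat{\Delta}_n \le \Delta). \tag{7.2}$$

Here, the notation $x^+$ is used to mean $\max(x, 0)$. Therefore,

$$P\left(\max_{n_{1c} \le n \le n_{2c}} |\hat{\Delta}_n^2 - \Delta^2| \ge k\right) \le P\left(\max_{n_{1c} \le n \le n_{2c}} (\hat{\Delta}_n^2 - \Delta^2)^+ \ge \frac{k}{2}\right) + P\left(\max_{n_{1c} \le n \le n_{2c}} |\hat{\Delta}_n - \Delta| \ge \frac{k}{4\Delta}\right).$$

Since $(\hat{\Delta}_n^2 - \Delta^2)^+$ and $|\hat{\Delta}_n - \Delta|$ are reverse submartingales, using the
maximal inequality for reverse submartingales (Ghosh et al. 1997), we write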

P 


( max 

\nic<n<n 2 c 





The last two inequalities are obtained by the Cauchy-Schwarz inequality and Lemma 2.2 of Sen and
Ghosh (1981). The moment conditions of this lemma are needed to ensure that all expectations
exist in the last three inequalities. □


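Lemma 7.2. Let $\bar{X}_n$ be the sample mean based on nonnegative i.i.d. observations
$X_1, \ldots, X_n$. For $r \ge 1$, $E(\bar{X}_n^{-r}) \le E(X_1^{-r})$.

Proof. Note that $\bar{X}_n \ge \left(\prod_{i=1}^{n} X_i\right)^{1/n}$ as the observations are
nonnegative. Therefore,

$$E\left(\bar{X}_n^{-r}\right) \le E\left[\left(\prod_{i=1}^{n} X_i\right)^{-r/n}\right] = \left[E\left(X_1^{-r/n}\right)\right]^n. \tag{7.3}$$

The last equality is due to the i.i.d. property of the observations. We know that
$\{E|Y|^p\}^{1/p}$ is a nondecreasing function of $p$ for $p > 0$. Applying this result with
$Y = X_1^{-r}$ and $p = 1/n \le 1$ in (7.3), we complete the proof. □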
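
The chain of inequalities in this proof can be checked exactly on a small discrete example. The following Python sketch is ours and purely illustrative: it takes $X$ uniform on a hypothetical three-point support, enumerates all samples of size $n$, and compares $E(\bar{X}_n^{-r})$, $[E(X_1^{-r/n})]^n$, and $E(X_1^{-r})$.

```python
from itertools import product


def expect_inverse_mean_power(support, n, r):
    """E[(sample mean)^(-r)] for n i.i.d. draws, uniform on `support`, by enumeration."""
    total = 0.0
    for sample in product(support, repeat=n):
        total += (sum(sample) / n) ** (-r)
    return total / len(support) ** n


def expect_power(support, p):
    """E[X^p] for X uniform on `support` (values must be strictly positive)."""
    return sum(v ** p for v in support) / len(support)


support = [1.0, 2.0, 4.0]   # hypothetical positive income values
n, r = 2, 1

lhs = expect_inverse_mean_power(support, n, r)    # E[mean^(-r)]
middle = expect_power(support, -r / n) ** n       # (E[X^(-r/n)])^n, the bound in (7.3)
rhs = expect_power(support, -r)                   # E[X^(-r)]

print(lhs, middle, rhs)
print(lhs <= middle <= rhs)
```

The first inequality is the arithmetic-geometric mean step and the second is the monotonicity of $\{E|Y|^p\}^{1/p}$ in $p$.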


Lemma 7.3. Suppose that nonnegative i.i.d. random variables Xj,..., X^ are observed from the 
distribution E such that E{XiY^ and E{Xi)~ exist for some positive integers r and 

p. Then, for any A; > 0, 


P ( max 

\nic<n<n2c 






as c 0. 
Proof. By Taylor expansion of Xj = (l + (X„ — /i)//i) we have




X„ /iV \Xn P 


1 

P"’ 


.L(x„ - ^)+1 /^ 




2 /i 2 


2 r' 


r+2 


fi 


where z e [1, X„//i]. Since z I{X < /i ^) < 1, proceeding along the lines of (17.21) 


1 


/i'- 




< 


X^ 

1 


P’ 

1 ^ + 


X 


n 

r 


/i 


r+l 


lx. 


p\ “f 


1 

T' /i 

r(r + 1) 




Xn P 


2p 


r+2 


{Xn - /i)^ 


(7.4) 


l.aUin= (=r - , f/2n = ^ |^n “ /ij, and f/sn = (^n “ /i)^- Uslng wc Can 


write 


/ 

1 1 

1 max 

-- - 

\nic<n<n 2 c 

X, /i’' 


> k \ < P ( max Uin > ^] + P ( max U 2 n > ^ 

nic<n<n2c oj \nic<n<n2c O 

k 

P I max U'in > X 

nic<n<n2c O 


(7.5) 


Since — -Pj is a reverse submartingale and f{x) = x+ is a non-decreasing convex function of 
X, Pin is a reverse submartingale. Therefore, using maximal inequality for reverse submartingales 


P ( max Pin > — < I T E 


nic<n<n2c 


< I ^Vpp 




X 


nic 


1 P 


\x: 


•-1 


+ 


/ix: 


•-2 


+ ... + 


/i 


r—1 


I (^nic < A^) 


LV^'ni. "" 


<,fr« 


< 0(n-^/'). (7.6) 

The last two inequalities are obtained by using Cauchy-Schwarz inequality and lemma 2.2 of Sen 


i^nic P) 


Ap 


E 


fiXr 


Ap 


E 


—2p{r-l) 
and Ghosh (1981). Due to Lemma 7.2, existence of E{Xi) i)} ensures the existence of


Xr, 


4p 


and E 


x„ 


2p{r—l) 


. Sinee |X„ — /i| and are reverse submartingales, 


we ean write 


P I max 

nic<n<n2c 




P I max f/ 3 „ > - 1 < 

nic<n<n2c 


2kfi 


(7.7) 

(7.8) 


Apply (7.6), (7.7), and (7.8) in (7.5) to complete the proof.


□ 


Lemma 7.4. Suppose that nonnegative Ltd. observations Xi,..., are such that E{Xf') and 
E{X^^^) exist for some r > 1. For any e G (0,1) and 7 > 0, 


(i) P{Nc < nc(l — e)) = O y^ic j = O j as c ), 0, 

(ii) P{Nc > nc(l + e)) = O = O as c I 0. 


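Proof. Using the definition of the stopping rule in (2.7) and (7.1), we have

$$\begin{aligned}
P(N_c \le n_{2c}) &\le P\left(n \ge \sqrt{A/c}\,V_n \text{ for some } n \in [n_{1c}, n_{2c}]\right) \\
&\le P\left(V_n \le \sqrt{c/A}\,n_{2c} \text{ for some } n \in [n_{1c}, n_{2c}]\right) \\
&\le P\left(|V_n^2 - \xi^2| \ge \xi^2 \epsilon(2 - \epsilon) \text{ for some } n \in [n_{1c}, n_{2c}]\right) \\
&\le P\left(\max_{n_{1c} \le n \le n_{2c}} \{|V_{1n}| + |V_{2n}| + |V_{3n}| + |V_{4n}|\} \ge \xi^2 \epsilon(2 - \epsilon)\right),
\end{aligned} \tag{7.9}$$

where $V_{1n} = \left(\frac{\hat{\Delta}_n^2 S_n^2}{4\bar{X}_n^4} - \frac{\Delta^2\sigma^2}{4\mu^4}\right)$,
$V_{2n} = \left(\frac{\hat{\Delta}_n \hat{\tau}_n}{\bar{X}_n^3} - \frac{\Delta\tau}{\mu^3}\right)$,
$V_{3n} = \left(\frac{\hat{\Delta}_n^2}{\bar{X}_n^2} - \frac{\Delta^2}{\mu^2}\right)$, and
$V_{4n} = \left(\frac{s_{wn}^2}{4\bar{X}_n^2} - \frac{\sigma_1^2}{\mu^2}\right)$.

Let $k = \xi^2 \epsilon(2 - \epsilon)$. Then, (7.9) can be written as
$P(N_c \le n_{2c}) \le P_1 + P_2 + P_3 + P_4$, where

$$P_i = P\left(\max_{n_{1c} \le n \le n_{2c}} |V_{in}| \ge \frac{k}{4}\right), \quad \text{for } i = 1, 2, 3, 4.$$

First, let us find an upper bound of $P_1$. Let $T_{1n} = (\hat{\Delta}_n^2 - \Delta^2)$,
$T_{2n} = (S_n^2 - \sigma^2)$, and $T_{3n} = \left(\frac{1}{4\bar{X}_n^4} - \frac{1}{4\mu^4}\right)$. Note that

$$V_{1n} = T_{1n}T_{2n}T_{3n} + \Delta^2 T_{2n}T_{3n} + \sigma^2 T_{1n}T_{3n} + \frac{1}{4\mu^4} T_{1n}T_{2n} + \frac{\sigma^2}{4\mu^4} T_{1n} + \frac{\Delta^2}{4\mu^4} T_{2n} + \Delta^2\sigma^2 T_{3n}. \tag{7.10}$$

Let us consider the first term in the summation of (7.10) and state the following inequalities.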


P ( max \TinT2nT3n\ > 

\nic<n<n2c Zo J \nic<n<n2c ' ''X 


28, 
-r/P _ 


< 0(0 + 0{n-,:) + 0(n-:^^) = 0(0 )• (7.11) 


The asymptotie orders in (17.111) are obtained by using lemma iTTl maximal inequality for reverse 
martingales (Lee, p. 112, 1990), lemma 2.2 of Sen and and Ghosh (1981), and lemma iTOl The 
eonditions of lemma l74l are also used in (17.111) . Following the same argument as above, one ean 
show that the aymptotie order of probability of large deviations (as in (17.111) 1 eorresponding to the 
remaining six terms in the summation of (17.101) are either 0(n]“J’) or 0(n)~J'^^). Therefore, 


Px 


max 

nic<r7<n2, 


\Vin\>^ < 0(ni;/^). 


(7.12) 


Note that all the estimators in V 2 n and Lsn are U-statisties as we had in the ease of Vi„. So, follow¬ 
ing similar arguments as in the proof of (17.121) . one ean show that both P 2 and P3 are (9(n7j^^) as 
c 0. 

To work with $P_4$, we note that the expression of $V_{4n}$ involves $s_{wn}^2$, which is not a U-statistic.
Therefore, the arguments given in the case of $P_1$-$P_3$ may not work without an additional result. Following
the proof of lemma 3.1 of Sen and Ghosh (1981) and noting that E {Xf') < 00 for r > 1, 


P 


( 

^wn 2 

max 

\nic<n<n2c 



> P < 0{n 


r;), 


for any positive constant K.


(7.13) 
Noting that $V_{4n} = W_{1n}W_{2n} + \sigma_1^2 W_{2n} + \mu^{-2} W_{1n}$, where $W_{1n} = (S_{wn}^2 - \sigma_1^2)$ and $W_{2n} = \left(\frac{1}{\bar X_n^2} - \frac{1}{\mu^2}\right)$,


P4 < P ( max \WinW 2 n\ > + ^ f \W 2 n\ > ^ ^ l^l^l ^ ^ 

^nic<n<n2c lZ J \nic<n<n2c VZO^J \nic<n<n2c iZ 


< ^P (max^jWinl > + 0(nj/^) + 0(ni;) < 


(7.14) 


The asymptotic orders in (7.14) are obtained by using lemma 7.3 and the inequality in (7.13).
We complete the proof of (i) by adding all the upper bounds for $P_1$-$P_4$ and noting that $n_{1c} = O\left(c^{-1/(2(1+\gamma))}\right)$.
The proof for part (ii) of lemma 7.4 is very similar to the proof of part (i). □
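The decompositions used above for $V_{1n}$ in (7.10) and for $V_{4n}$ before (7.14) are both instances of expanding a product of "centre plus deviation" factors and subtracting the product of the centres. The following is a minimal sketch, not part of the original proof, that checks the two generic identities with exact rational arithmetic. The function names and the test grid are illustrative assumptions; in the notation of the proof one would take the centres to be quantities such as $\Delta^2$, $\sigma^2$, $\sigma_1^2$ and powers of $1/\mu$.

```python
from fractions import Fraction
from itertools import product


def three_factor_expansion(a, b, c, t1, t2, t3):
    """Seven-term expansion of (a + t1)(b + t2)(c + t3) - a*b*c."""
    return (t1 * t2 * t3 + a * t2 * t3 + b * t1 * t3 + c * t1 * t2
            + b * c * t1 + a * c * t2 + a * b * t3)


def two_factor_expansion(s, m, w1, w2):
    """Three-term expansion of (s + w1)(m + w2) - s*m."""
    return w1 * w2 + s * w2 + m * w1


def expansions_hold(values):
    """Check both identities exactly on every combination of the given values."""
    for a, b, c, t1, t2, t3 in product(values, repeat=6):
        direct = (a + t1) * (b + t2) * (c + t3) - a * b * c
        if direct != three_factor_expansion(a, b, c, t1, t2, t3):
            return False
    for s, m, w1, w2 in product(values, repeat=4):
        direct = (s + w1) * (m + w2) - s * m
        if direct != two_factor_expansion(s, m, w1, w2):
            return False
    return True


# Illustrative grid of rational test values (hypothetical, not from the paper).
grid = [Fraction(-3, 2), Fraction(0), Fraction(1, 3), Fraction(2)]
print(expansions_hold(grid))
```

The seven summands of the first identity correspond to the "first term" and the "remaining six terms" treated separately in the bound for $P_1$.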

Lemma 7.5. If nonnegative i.i.d. observations Xi,, Xn are such that E{Xf’) and E{Xf'^^°‘) 
exist for some r > 1 and a > 1, then 


e( max ((7„ — (jiT’)'’) = 0{n^f^'^) asclO. 
ynic<n<n2c J 


Proof. Applying the $C_r$ inequality, we can write




<- a: =— +_ a„-a 


Xn dj d 


By Cauchy-Schwarz inequality and lemma 9.2.4 of Ghosh et al. (1997), we have 


2E ( max {Gn — GfY 

' nic<n<n2c 


< -i E { max A‘^ ) E I max ( ^-- 

,nic<n<n2c / \ nic<n<n2c \A„ /i 


1 1 \ ^ 1 


/i'’ \r -1 


(7.15) 


E (A„,^ - A 


< <! P I max A 

nic<n<rx2, 


/i \nic<n<n 2 c J \nic<n<n 2 c ^ 


0{n 


-r/ 2 x 
Ic ) 


The last inequality is obtained by Cauchy-Schwarz inequality and lemma 2.2 of Sen and Ghosh 
(1981). Note that, by lemma 9.2.4 of Ghosh et al. (1997), lemma 2.2 of Sen and Ghosh (1981), 
and existence of E (Xf^),


^ - '*)*) s (4^:1)" ^ - X < o(%f). 


E{ max A^M < ' 

nic<n<n 2 c / V 2r — 1 


2r 


max 


nic<n<n2c 


< 1 + 


P 


P(A-J < 00 , and 
1 


P X 


max 


iiic<n<n2c 


4r — 


> t df < 1 + 


-—4ra 

‘nic 


a 


(7.16) 

(7.17) 

(7.18) 


is finite as $E(X_1^{-4r\alpha}) < \infty$. The last inequality is due to the maximal inequality for reverse
submartingales (Lee, p. 112, 1990). Using (7.16)-(7.18) in the upper bound for (7.15), we complete
the proof. □


Lemma 7.6. IfE{Xf) and E{X^ ") exist for a > 8, then E 
Proof. To prove lemma IT^ it is enough to show that: (i) E 


supK? 

n>m 


< oo for m> A. 


sup sl^X^ 


n>m 


—2 


" 


_____ 

, (ii) P 

sup 

=T In 

^ n 


rL>m 


(ill) P 

A2 

SUp=T 

, and (iv) P 

sup||^2 


_n>m^^ _ 

-| 



have E 


are finite. Following Sen and Ghosh (p. 338, 1981), we 

-4 


sup < oo if P[X"] < oo for a > 4 and m > 4. By (17.181) . E 

n>m 


sup X^ 

n>m 


< oo if 


P[X]^ “] < oo for a > 4. Therefore, 


n>m 


E sup sl^X^ < <^ P sup P sup X 


n>m 


n>m 


1/2 


< OO. 


For (ii), we note that $\Delta_n$ and $\tau_n$ are U-statistics. Using lemma 9.2.4 of Ghosh et al. (1997),


P ( sup 

yn>m 


a; 


< 1-1 B 


and P ( sup \f^ 

n>m 




Applying Cauchy-Schwarz inequality twice, 

A„,. 


P I sup 

n>m 


-Tn 


X 


< i P ( sup 

<n>m 


E ( sup \t^\ 

n>m 


E ( sup 

n>m 


X, 


-6 


< OO, 
if E{Xf) and “) exist for a > 6. Similarly, we can show that E


A2 

sup^ 

n>m^n 


< cx> if E{Xf\ 


and E{X^ “) exist for a > 4. Finally, E 
This completes the proof.


«nr. <?2 

n>m^n 


< oo if E{Xl) and E{Xt^ “) exist for a > 8. 

□ 


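Lemma 7.7. Let $U_n$ be a U-statistic for estimating $\theta$ based on $n$ observations. For any $\epsilon \in (0,1)$,

$$E\left(\max_{n_{2c} \le n \le n_c} (U_n - U_{n_c})^4\right) = O\left(\frac{\epsilon}{n_c^2}\right) \quad \text{as } c \downarrow 0.$$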


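Proof. Since $\{U_n - \theta\}$ is a reverse martingale, lemma 9.2.4 of Ghosh et al. (1997) yields

$$E\left(\max_{n_{2c} \le n \le n_c} (U_n - U_{n_c})^4\right) \le \left(\frac{4}{3}\right)^4 E\left(U_{n_{2c}} - U_{n_c}\right)^4. \tag{7.19}$$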


Let Vn = Un — 9. Using reverse martingale property of 14, i.e., E{Vn 2 ^ I we have 


E (Vn^Vl) = E «), E > E «) , and 

E (Vlyl) < {E (O E «)}’ < E {v:j . 


(7.20) 

(7.21) 


Using (7.20)-(7.21) and the asymptotic form of the 4th central moment of U-statistics (Sen, p. 55, 1981),


E ((/„,. - (/„J‘ = E (vy) + E «) - 4B (14„0 - 4B (l/= .K.) + 6^^ (>410 

< 7 {B (C) - -B (O } = O (4 - + 0 (= O ((s') ■ <7.22) 




(7.22) is obtained by noting that $n_{2c} = n_c(1 - \epsilon)$. Hence, the proof is complete.


□ 
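The cross-moment identities used in this proof rest on the reverse martingale property: conditionally on the larger sample, the centred statistic from the smaller sample averages to the centred statistic from the larger one. As a hedged illustration only, the sketch below checks one such identity, $E(V_{n} V_{N}^3) = E(V_{N}^4)$ for $n \le N$, in the simplest special case where the U-statistic is the sample mean. The support values, sample sizes and function name are assumptions chosen for the example, and the expectation is computed exactly by enumerating all equally likely samples.

```python
from fractions import Fraction
from itertools import product


def cross_moments(support, n_small, n_large):
    """Return (E[V_small * V_large^3], E[V_large^4]) for centred sample means.

    Observations are i.i.d. uniform on `support`; V_k is the mean of the first
    k observations minus the population mean. Enumeration makes this exact.
    """
    mu = Fraction(sum(support), len(support))
    total = len(support) ** n_large
    lhs = Fraction(0)
    rhs = Fraction(0)
    for xs in product(support, repeat=n_large):
        v_small = Fraction(sum(xs[:n_small]), n_small) - mu
        v_large = Fraction(sum(xs), n_large) - mu
        lhs += v_small * v_large ** 3
        rhs += v_large ** 4
    return lhs / total, rhs / total


# Hypothetical example: support {0, 1, 3}, small sample 2, large sample 4.
lhs, rhs = cross_moments((0, 1, 3), 2, 4)
print(lhs == rhs)
```

The equality holds because, by exchangeability, each observation has the same expected product with any symmetric function of the larger sample.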


Lemma 7.8. If nonnegative i.i.d. observations Xi,..., are such that E{Xf) and E{X^ ^®") 
exist for a > 1, then for e G (0,1), 


$$E\left[\max_{n_{2c} \le n \le n_{3c}} (G_n - G_{n_c})^2\right] = O\left(\frac{\sqrt{\epsilon}}{n_c}\right) \quad \text{as } c \downarrow 0.$$
Proof. E


max {Gn 

n2c<n<nsc. 


GnJ 


2 


< El + E 2 , where E 2 


E 


max {Gn 

T^c<7^<7^3c 




2 


and 


El 


E 


max {Gn 

n2c‘^n<nc 


Gr 


) 


2 


E 


max 

n2c<n<nc 




Ell + Ei2 

4 


(7.23) 


where E^ = E 


max 


A 


n2c<»l<)^c ^nc 

ing Cauehy-Sehwarz inequality thriee, we can write 


and Eio = E 


max (An — Ar 

n2c<n<nc^n, 


• Apply- 


\e 

max {Xn-Xn^Y 

ri^ 

max A® 

1 

n2c'^n<nc ' 

j l 

772c^77<nc 



E 


max 


n2c<n<nc 
Using lemma 7.7, lemma 9.2.4 of Ghosh et al. (1997), (7.18), and the conditions of lemma 7.8,
we conclude that $E_{11} = O(\sqrt{\epsilon}/n_c)$. Similarly, using the Cauchy-Schwarz inequality and lemma 17^
we have $E_{12} = O(\sqrt{\epsilon}/n_c)$. Therefore, $E_1 = O(\sqrt{\epsilon}/n_c)$. Following the same arguments as above,
one can show that $E_2 = O(\sqrt{\epsilon}/n_c)$. Hence, lemma 7.8 is proved. □


7.3. Proof of Theorem l3H] 


The proofs of parts (i) and (ii) are similar to ?. (i) The definition of the stopping rule in (2.7)
yields

$$\sqrt{A/c}\; V_{N_c} \le N_c \le m + \sqrt{A/c}\left(V_{N_c - 1} + (N_c - 1)^{-\gamma}\right). \tag{7.24}$$



Since $N_c \to \infty$ a.s. as $c \downarrow 0$ and $V_n \to \xi$ a.s. as $n \to \infty$, by theorem 2.1 of Gut (?), $V_{N_c} \to \xi$ a.s.
Hence, dividing all sides of (7.24) by $n_c$ and letting $c \downarrow 0$, we prove $N_c/n_c \to 1$ a.s. as $c \downarrow 0$.
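To make the sandwich inequality (7.24) concrete, here is a minimal sketch of a purely sequential stopping rule of the form suggested by that inequality: stop at the first $n \ge m$ with $n \ge \sqrt{A/c}\,(V_n + n^{-\gamma})$. This form is an assumption reconstructed from (7.24), not a verbatim copy of the rule in (2.7). The function name `stopping_time` and the callable `v_hat` (a user-supplied estimator of $\xi$) are hypothetical, and the paper's own estimator $V_n$ is not reproduced here.

```python
import math
from typing import Callable, Optional, Sequence


def stopping_time(data: Sequence[float],
                  v_hat: Callable[[Sequence[float]], float],
                  A: float, c: float, gamma: float, m: int) -> Optional[int]:
    """First n >= m with n >= sqrt(A/c) * (v_hat(first n points) + n**(-gamma)).

    Returns None if the rule has not stopped by the end of `data`.
    """
    scale = math.sqrt(A / c)
    for n in range(m, len(data) + 1):
        # The n**(-gamma) term keeps the rule from stopping too early.
        if n >= scale * (v_hat(data[:n]) + n ** (-gamma)):
            return n
    return None


# Illustration with a constant plug-in value (hypothetical, for checking only).
placeholder = list(range(100))
n_stop = stopping_time(placeholder, lambda sample: 2.0,
                       A=100.0, c=1.0, gamma=0.25, m=5)
print(n_stop)
```

With a constant plug-in value the rule is deterministic, which makes it easy to check by hand that the stopped sample size is at least $\sqrt{A/c}$ times the plug-in value, as on the left side of (7.24).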

(ii) Since $N_c \ge m$ a.s. and $n_c > 1$, dividing (7.24) by $n_c$ yields


Nc/ric < m + 


1 

e 


( sup -f (m 
Vc>0 



almost surely. 


(7.25) 
where the expectation of the supremum on the right-hand side is finite by lemma 7.6. Since $N_c/n_c \to 1$ a.s. as $c \downarrow 0$, by the dominated
convergence theorem, we conclude that $\lim_{c \downarrow 0} E(N_c/n_c) = 1$.

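(iii) We need to show that $\lim_{c \downarrow 0} R_{N_c}(G_F)/R^{*}_{n_c}(G_F) = \lim_{c \downarrow 0} (A/2cn_c)\, E(G_{N_c} - G_F)^2 + \frac{1}{2}\lim_{c \downarrow 0} E(N_c/n_c) = 1$. Thus, it is enough to show that $\lim_{c \downarrow 0} (A/cn_c)\, E(G_{N_c} - G_F)^2 = 1$, i.e., $\lim_{c \downarrow 0} n_c E(G_{N_c} - G_F)^2 = \xi^2$. Since we know that $n_c E(G_{n_c} - G_F)^2 \to \xi^2$ as $c \downarrow 0$, it is sufficient to show that

$$\lim_{c \downarrow 0} n_c \left[E\left((G_{N_c} - G_F)^2 - (G_{n_c} - G_F)^2\right)\right] = 0. \tag{7.26}$$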

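Let $E_1 = E\left[(G_{N_c} - G_F)^2 I(N_c \le n_{2c})\right]$. By (7.1), lemma 7.4 and lemma 7.5, we have

$$
\begin{aligned}
n_c E_1 &\le n_c E\left[\max_{n_{1c} \le n \le n_{2c}} (G_n - G_F)^2\, I(N_c \le n_{2c})\right] \\
&\le n_c \left\{E\left[\max_{n_{1c} \le n \le n_{2c}} (G_n - G_F)^4\right] P(N_c \le n_{2c})\right\}^{1/2} = O(c^{h}),
\end{aligned}
\tag{7.27}
$$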


where h = {1 — 27)/(4 + dy) > 0 using 7 G (0,|). Here, we assume that E{Xl^) and 
i7(2ff^®") exist for a > 1. Following the same arguments as in lemma l73l we can show 
that E {Gn, - GpY = O {n-Y provided E{XY) and E(Xfi®“) exist for a > 1 . Let E 2 = 
E [{Gn, — GpYI{Nc < n 2 c)] - By Cauchy-Schwarz inequality and lemma 1731 we have 


n,E2 < n, {E [{Gn, - G^)^] P(iV, < na,)}^ = O (c^) (7.28) 


provided E{XY) and E{Xi ^^) exist. Therefore, combining (7.27) and (7.28), we have


$$\lim_{c \downarrow 0} n_c E\left[\{(G_{N_c} - G_F)^2 - (G_{n_c} - G_F)^2\}\, I(N_c \le n_{2c})\right] = 0. \tag{7.29}$$


Using the same arguments as in lemma 1731 one can show that E 
provided E{Xf) and E{X^^^°‘) exist for a > 1. Let E^ = 


max {Gn 

n>n3. 


E [(G;v. 



= G (ng,^) 


GpfI{N, > ng,)]. 
The Cauchy-Schwarz inequality and lemma 7.4 yield


ricEs <nc<E 


max {Gn — GpY 

n>nzc 


P(iVc>n3e)V =o(c^) 


Following the same approaeh as in (17.281) . UcE — Gf)‘^I{Nc > n^c)] < O (02+27 


$$\lim_{c \downarrow 0} n_c E\left[\{(G_{N_c} - G_F)^2 - (G_{n_c} - G_F)^2\}\, I(N_c \ge n_{3c})\right] = 0.$$


(7.30) 

Thus, 

(7.31) 


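Hence, it remains to prove that

$$\lim_{c \downarrow 0} n_c E\left[\{(G_{N_c} - G_F)^2 - (G_{n_c} - G_F)^2\}\, I(n_{2c} < N_c < n_{3c})\right] = 0. \tag{7.32}$$

Let $W = \{(G_{N_c} - G_F)^2 - (G_{n_c} - G_F)^2\}\, I(n_{2c} < N_c < n_{3c})$. Note that

$$
\begin{aligned}
W &= \{(G_{N_c} - G_F) + (G_{n_c} - G_F)\}\,(G_{N_c} - G_{n_c})\, I(n_{2c} < N_c < n_{3c}) \\
&\le 2\left\{\max_{n_{2c} \le n \le n_{3c}} |G_n - G_F|\right\}\left\{\max_{n_{2c} \le n \le n_{3c}} |G_n - G_{n_c}|\right\} I(n_{2c} < N_c < n_{3c}).
\end{aligned}
$$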


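Using the Cauchy-Schwarz inequality, lemma 7.8, and following the lines of lemma 7.5,

$$
\begin{aligned}
n_c E[W] &\le 2n_c \left\{E\left(\max_{n_{2c} \le n \le n_{3c}} (G_n - G_F)^2\right) E\left(\max_{n_{2c} \le n \le n_{3c}} (G_n - G_{n_c})^2\right)\right\}^{1/2} \\
&\le 2n_c \left\{O\left(n_c^{-1}\right) O\left(\sqrt{\epsilon}/n_c\right)\right\}^{1/2} = O\left(\epsilon^{1/4}\right).
\end{aligned}
\tag{7.33}
$$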

Since (7.33) is true for any $\epsilon \in (0,1)$, taking the limit on both sides of (7.33) as $\epsilon \to 0$, (7.32) is
proved. Hence, the proof of the theorem is complete.
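The order in (7.33) follows from simple arithmetic once the two expectations are bounded. As a sketch, assuming the first expectation is $O(n_c^{-1})$ (by the argument of lemma 7.5) and the second is $O(\sqrt{\epsilon}/n_c)$ (as read from lemma 7.8):

$$2n_c\left\{n_c^{-1} \cdot \sqrt{\epsilon}\, n_c^{-1}\right\}^{1/2} = 2n_c \cdot \epsilon^{1/4}\, n_c^{-1} = 2\epsilon^{1/4}.$$

The bound no longer depends on $c$, so it survives the limit $c \downarrow 0$, and it vanishes as $\epsilon \to 0$, which is why $\epsilon$ can be taken arbitrarily small at the end of the argument.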
7.4. Proof of lemma 2.3


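By bivariate Taylor expansion of $f(\Delta_n, 2\bar X_n) = \Delta_n/(2\bar X_n)$ around $(\Delta, 2\mu)$,

$$\frac{\Delta_n}{2\bar X_n} = \frac{\Delta}{2\mu} + \frac{\Delta_n - \Delta}{2\mu} - \frac{\Delta}{2\mu^2}\left(\bar X_n - \mu\right) + R_{1n}, \tag{7.34}$$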


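where $R_{1n} = -2(\Delta_n - \Delta)(\bar X_n - \mu)/b^2 + 4a(\bar X_n - \mu)^2/b^3$, $a = \Delta + p(\Delta_n - \Delta)$, $b = 2\mu + p(2\bar X_n - 2\mu)$, and $p \in (0,1)$. Let $E_{1n} = E(R_{1n}^2)$, $E_{2n} = \frac{1}{\mu}E\left(R_{1n}(\Delta_n - \Delta)\right)$, and $E_{3n} = -\frac{\Delta}{\mu^2}E\left(R_{1n}(\bar X_n - \mu)\right)$. Squaring both sides of (7.34) and taking expectation,

$$E\left(\frac{\Delta_n}{2\bar X_n} - \frac{\Delta}{2\mu}\right)^2 = \frac{1}{4\mu^2}V(\Delta_n) + \frac{\Delta^2}{4\mu^4}V(\bar X_n) - \frac{\Delta}{2\mu^3}\,cov(\Delta_n, \bar X_n) + \sum_{i=1}^{3} E_{in}. \tag{7.35}$$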


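Using variance and covariance formulas for U-statistics (Lee, 1990), it is simple to show that
$V(\Delta_n) = 4\sigma_1^2/n + O(n^{-2})$ and $cov(\bar X_n, \Delta_n) = \frac{2}{n}(\tau - \mu\Delta)$. Therefore, it remains to show that
$\sum_{i=1}^{3} E_{in} = O(n^{-3/2})$. First, we work on $E_{1n}$. Note that $R_{1n}^2 = 4W_{1n} + 16W_{2n} - 16W_{3n}$,
where $W_{1n} = \frac{1}{b^4}(\Delta_n - \Delta)^2(\bar X_n - \mu)^2$, $W_{2n} = \frac{a^2}{b^6}(\bar X_n - \mu)^4$, and $W_{3n} = \frac{a}{b^5}(\Delta_n - \Delta)(\bar X_n - \mu)^3$. By the
Cauchy-Schwarz inequality and lemma 2.2 of Sen and Ghosh (1981),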


E \W,nIiXn >p)\< y^E (^( a „ - a )' {Xn - p)"^ = 0(n-^), 

E \W,nl(Xn <p)\<^Ie (^(An - A)" (A„ - /i)"^ ^ (^j f = 
provided E{Xl) and E{X^ ®) exist. Following the same approach, we have


E 


> A)J(X„ > 


E 

E 

E 


W2nI{K > A)/(X„ < ^^) 

W2nI{K < A)/(X„ > /i) 
W2J{K < A)/(X„ < /i) 


<-B K-W - = 0(n-^), 

< .E f (X„ - /i)" ^ = 0(n-2), 

- V (2X„)V ^ ^ 


provided and i?(Xf ^®) exist. Similarly, we ean show that EiW^n) = 0{n~‘^) provided 

E{X\‘^) and i?(Xfexist. Therefore, $E_{1n} = E(R_{1n}^2) = O(n^{-2})$. By the Cauchy-Schwarz inequality
and lemma 2.2 of Sen and Ghosh (1981), we obtain $E_{2n} = O(n^{-3/2})$ and $E_{3n} = O(n^{-3/2})$. Hence,
lemma 2.3 is proved.
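The quantities in this proof can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it computes the Gini index estimate as the Gini mean difference divided by twice the sample mean, following (1.2)-(1.3), and evaluates the remainder left after the first-order expansion in (7.34). The function names are hypothetical, and exact rational arithmetic is used so the small example can be verified by hand.

```python
from fractions import Fraction
from itertools import combinations


def gini_mean_difference(xs):
    """Average absolute difference over all unordered pairs of observations."""
    pairs = list(combinations(xs, 2))
    return Fraction(sum(abs(a - b) for a, b in pairs), len(pairs))


def gini_index(xs):
    """Gini estimate: mean difference divided by twice the sample mean."""
    mean = Fraction(sum(xs), len(xs))
    return gini_mean_difference(xs) / (2 * mean)


def linearization_remainder(delta_n, xbar_n, delta, mu):
    """Remainder after the first-order expansion of delta_n / (2 * xbar_n)
    around the point (delta, 2 * mu), as in the linear part of (7.34)."""
    delta_n, xbar_n, delta, mu = map(Fraction, (delta_n, xbar_n, delta, mu))
    linear = (delta / (2 * mu)
              + (delta_n - delta) / (2 * mu)
              - delta * (xbar_n - mu) / (2 * mu ** 2))
    return delta_n / (2 * xbar_n) - linear


# Incomes 1, 2, 3, 4: pairwise differences sum to 10 over 6 pairs, mean is 5/2.
print(gini_index([1, 2, 3, 4]))
# Perturbing both arguments by 1/10 leaves a second-order remainder.
print(linearization_remainder(Fraction(11, 10), Fraction(9, 10), 1, 1))
```

In the second call the deviations are of size $1/10$ while the remainder is of size roughly $1/100$, consistent with the remainder being second order in the deviations.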